Machine Learning-based Framework Develops a Tumor Thrombus Coagulation Signature in Multicenter Cohorts for Renal Cancer

Background: Renal cell carcinoma (RCC) is frequently accompanied by tumor thrombus in the venous system, which carries an extremely dismal prognosis. The current Tumor Node Metastasis (TNM) stage and Mayo clinical classification do not adequately inform preference-sensitive treatment decisions. Therefore, a better model for precision medicine is urgently needed. Methods: In this study, we developed a tumor thrombus coagulation signature for RCC with 10 machine-learning algorithms (101 combinations) based on a novel computational framework using multiple independent cohorts. Results: The established tumor thrombus coagulation-related risk stratification (TTCRRS) signature comprises 10 prognostic coagulation-related genes (CRGs). This signature predicted survival outcomes in public and in-house protein cohorts and showed high performance compared to 129 published signatures. Additionally, the TTCRRS signature was significantly associated with the immune landscape, immunotherapy response, and chemotherapy response. Furthermore, we identified hub genes, transcription factors, and small compounds based on the TTCRRS signature. In addition, CYP51A1 can regulate the proliferation and migration of RCC cells. Conclusions: The TTCRRS signature can complement the traditional anatomic TNM staging system and Mayo clinical stratification and provide clinicians with more therapeutic options.


Supplemental Figures:
Because some tables are too large, we provide them individually as Excel files.

Figure S1 Mutation landscapes of TTCRGs in RCC patients. (A) Somatic mutations of some top genes in the OncoPrint. (B) Histogram of the proportion of different mutation groups in RCC. (C-D) Overall survival and relapse-free survival of RCC patients in the altered and unaltered mutation groups.

Figure S2 Unsupervised cluster analysis of the RCC patients based on 3 algorithms (CC, SNFCC and CNMF). (A-C) Average silhouette width plots represent the coherence of clusters. Similar samples in each cluster via 3 algorithms are gathered and a high value of average silhouette width means the correlation between

Figure S3 Immune characteristics between two clusters in different independent cohorts. (A-C) The correlation between the composition of the TME, expression of gene signatures related to the functional orientation of the immune TME, expression of genes related to immune checkpoints defined by the MCP-counter Z-scores and TTC clusters in the ICGC, E-MTAB-1980, and CM-025 cohorts. Adjusted P values are obtained from Benjamini-Hochberg correction of two-sided Kruskal-Wallis test P values. (D) Heatmap shows RCC patients' immune profiles across TTC subtypes. The top panel shows the expression of genes involved in immune checkpoint targets and the bottom panel shows the enrichment
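The Benjamini-Hochberg adjustment mentioned in the Figure S3 caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `benjamini_hochberg` and the example P values are hypothetical, and the sketch covers only the adjustment step, assuming the Kruskal-Wallis P values have already been computed elsewhere.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Hypothetical helper for illustration; assumes p_values is a list of
    raw P values (for example, one Kruskal-Wallis test per immune feature).
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling by m / rank and
    # enforcing monotonicity with a running minimum (capped at 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw P values, purely for demonstration.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each adjusted value is at least as large as its raw P value, and the ordering of the tests by significance is preserved.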

Figure S4 Clinical significance and immune landscape of TTCRRS subtypes in the TCGA cohort. (A) The biological pathways of two TTC clusters inferred with the GSVA algorithm. Red represents the activation of biological pathways and blue represents the inhibition of biological pathways. (B) The significant enrichment of biological pathways inferred with the GSEA algorithm. Genes are ranked by logFC of two TTC clusters. (C-D) GSEA analysis delineated the MeSH terms associated with TTCRRS by using terms of gendoo

Figure S5 The assessment of two TTC clusters in immune characteristics and chemotherapy. (A, C) The TMB score and TIDE score between two TTC clusters. (B) The expression of immune checkpoint molecules in two TTC clusters. (D) The chemotherapy response of two TTC clusters for eight frequently used drugs.

Figure S6 The procedure of WGCNA analysis. (A) Clustering analysis of samples from the GSE48000 cohort based on the mRNA expression profile to detect outliers. Each branch represents a sample, and the y-axis represents the cluster distance. (B) Analysis of the scale-free fit index (R²) and the mean connectivity with different soft-thresholding powers in the GSE48000 cohort. (C) Gene dendrograms obtained by average linkage hierarchical clustering in the GSE48000 cohort. The modules of expressed genes were assigned colors and numbers as indicated by the horizontal bar beneath each dendrogram (dynamic tree cut). (D) Hierarchical cluster dendrograms and heatmaps of the correlation between ME values and VTE.
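As a rough illustration of the scale-free fit index in panel B of Figure S6, the sketch below computes a signed R² from a list of node connectivities by regressing log10 of the bin frequency on log10 of the mean connectivity per bin. It is a simplified stand-in, not the WGCNA implementation and not the authors' code: the function name `scale_free_fit`, the equal-width binning, the choice to skip empty bins, and the toy connectivity values are all assumptions made for this example. Connectivities are assumed to be strictly positive.

```python
import math


def scale_free_fit(connectivity, n_bins=10):
    """Signed R^2 of a log-log fit of bin frequency against connectivity.

    Hypothetical helper: a positive value means frequency falls as
    connectivity rises, which is the scale-free-like direction.
    """
    k_min, k_max = min(connectivity), max(connectivity)
    if k_min <= 0 or k_max == k_min:
        raise ValueError("need positive, non-constant connectivity values")
    width = (k_max - k_min) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for k in connectivity:
        # Equal-width bins; the maximum value falls into the last bin.
        idx = min(int((k - k_min) / width), n_bins - 1)
        sums[idx] += k
        counts[idx] += 1
    n = len(connectivity)
    # Keep only non-empty bins (a simplification made for this sketch).
    xs = [math.log10(s / c) for s, c in zip(sums, counts) if c > 0]
    ys = [math.log10(c / n) for c in counts if c > 0]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxy == 0 or syy == 0:
        return 0.0
    r_squared = sxy * sxy / (sxx * syy)
    # Flip the sign so that a negative slope yields a positive index.
    return -math.copysign(1.0, sxy) * r_squared


# Toy network: many weakly connected nodes, few highly connected ones.
toy = [1] * 100 + [2] * 50 + [3] * 25 + [4] * 12 + [5] * 6 + [6] * 3 + [7] * 2 + [8]
fit = scale_free_fit(toy)
```

In a soft-thresholding analysis, an index of this kind would be evaluated once per candidate power and plotted against the power, alongside the mean connectivity.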

Figure S7 The process of TTCRGs.

Figure S8 Establishment of the TTCRRS signatures in multiple cohorts. (A-F) The distribution of TTCRRS signature, the vital status of patients, and the expression of TTCRGs in the TCGA-KIRC, ICGC, GSE167573, E-MTAB-1980, CM-025 and Meta cohorts.

Figure S9 Evaluation of the TTCRRS signature. (A) Time-dependent ROC analysis for predicting OS at 1, 3, and 5 years across all cohorts.

Figure S10 The assessment of two TTCRRS subtypes in chemotherapy for eight common drugs.

Figure S11 Function annotation of the TTCRRS signature based on the Meta-cohort. (A) The relationship between two TTCRRS subtypes and 28 immune cell infiltrations. (B) Scatterplots show the correlation of CD8A, PD-1, PD-L1, and CTLA4 with the TTCRRS in the Meta-cohort. (C) Violin plot shows the relationship between 28 immune cell infiltrations and the TTCRRS subtype. (D) Difference in pathway activities scored per patient by GSVA between high- and low-TTCRRS. Shown are t values from a linear model. (E) Butterfly plot illustrates the correlation between the TTCRRS and metabolic pathways, the enrichment pathways based on GSVA of GO and KEGG terms.

Figure S12 The correlation of TTCRRS with antitumor immunity. (A-D) Scatterplots show the correlation between TTCRRS and CD8A, PD-L1, CTLA4 and PD-1. (E-J) The violin plots depict the association between some immune response signatures (CXCR3, CCL5,

Figure S14 The 181 immune cell infiltrations between two TTCRRS subtypes.

Figure S15 The 181 immune cell infiltrations between two TTCRRS subtypes.

Figure S16 The 181 immune cell infiltrations between two TTCRRS subtypes.

Figure S18 Identification of hub gene-related transcription factors (TFs). (A, D) CYP51A1 and PSRC1 were regulated by the most likely TFs in human cancers. (B) The functional state of CYP51A1 in ccRCC based on CancerSEA. Red plots indicate that CYP51A1 is positively correlated with the functional state, while blue plots indicate that CYP51A1 is negatively correlated with the functional state identified by CancerSEA. (C) Single-cell analysis indicated that CYP51A1 is primarily involved in hypoxia in ccRCC; the correlation between CYP51A1 and hypoxia is shown in the scatter plot.